fix(query): clarify condition resolution semantics for label queries#2994
fix(query): clarify condition resolution semantics for label queries#2994contrueCT wants to merge 18 commits into
Conversation
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## master #2994 +/- ##
============================================
- Coverage 36.10% 35.32% -0.79%
- Complexity 338 499 +161
============================================
Files 803 803
Lines 68291 68381 +90
Branches 8970 8992 +22
============================================
- Hits 24656 24154 -502
- Misses 40990 41604 +614
+ Partials 2645 2623 -22 ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
4c42786 to
cc9af24
Compare
There was a problem hiding this comment.
I found one correctness issue in the latest revision. The CI failures were posted separately as a PR-level reminder.
CI/status checks are failing on the latest head (cc9af24929e42af1c90e1f55f3e60adc351e0318). Could you check the failed jobs before the next review round?
Failed checks include:
build-server (memory, 11): https://github.com/apache/hugegraph/actions/runs/26448131941/job/77861497015
cc9af24 to
2e82f83
Compare
There was a problem hiding this comment.
I don't see a clear blocking correctness issue in the latest head, and the previous LABEL-resolution comments look addressed. One remaining merge risk is that the latest checks are still red: hstore failed in VertexCoreTest#testQueryByDateProperty.
Since this PR also touches HstoreStore, could you rerun or clarify whether the hstore failure is an existing flaky/environment issue?
Add explicit condition resolution APIs to ConditionQuery while preserving the legacy condition() behavior. Introduce containsCondition(Object), conditionValues(Object), and conditionValue(Object) so callers can distinguish missing, empty, unique, and multi-value results without overloading null semantics. Migrate LABEL-specific consumers in graph/index transactions, serializers, traversers, and stores to use the new APIs for unique-label resolution and conservative fallback behavior. Extend QueryTest and VertexCoreTest to cover absent, conflicting, and multi-value label conditions as well as collectMatchedIndexes() behavior for multi-label and conflicting label queries.
94408b7 to
b10e3c2
Compare
801923a to
ebc31c8
Compare
|
Thanks for your patience. The hstore CI failure exposed an existing latent issue in hstore's range-index query path. For range-index scans with limit/paging, the upper layer assumed that backend scan results were globally ordered by the range-index key and that the returned page state could be reused as a HugeGraph range cursor. In hstore, multi-node/tablet scans can return entries in backend iterator order, and the page state is an internal storage cursor, so those assumptions may lead to unstable ordering or skipped results. This PR keeps the fix intentionally scoped: hstore range-index queries whose visible result depends on limit/offset/paging are sorted and sliced in the index layer, while unbounded scans still use the original streaming path to avoid disturbing count, joint-index, and cleanup paths. I think this is enough for the current PR, but the underlying hstore scan/page-state contract should be handled in a dedicated follow-up, ideally by defining whether range scans must be globally ordered and fixing the hstore iterator/page-state semantics at the storage-client layer. |
imbajin
left a comment
There was a problem hiding this comment.
Blocking: yes. Summary: HStore range-index offset queries can skip too many sorted results. Evidence: static review of GraphIndexTransaction/query offset handling.
|
Thanks. I fixed this by resetting |
34f1fb1 to
23d4f94
Compare
23d4f94 to
9e24432
Compare
imbajin
left a comment
There was a problem hiding this comment.
Blocking: yes. Summary: HStore range-index ordering is still bypassed for finite non-paging and large-page scans. Evidence: static review plus local unit/store-client tests.
🔗 Latest-head CI/status still has visible failures; please check hstore, codecov/patch, and codecov/project: https://github.com/apache/hugegraph/actions/runs/27525154399/job/81350967135
| type, null, position); | ||
| } | ||
|
|
||
| static boolean shouldUseOrderedRangeScan(IdRangeQuery query) { |
There was a problem hiding this comment.
Evidence: GraphIndexTransaction.keepBackendIndexOrder() marks HStore range indexes as order-sensitive whenever the original query has a finite limit/offset, but this gate only enables the new globally ordered merge for paging && limit <= net.kv.scanner.page.size. The fallback below still calls session.scan(...), which reaches the legacy mergeRangeScanIterators() path in NodeTxSessionProxy; its existing test asserts partition/top-work order like 3,1,4,2, not global range-key order.
Impact: multi-partition HStore range-index queries can still return the wrong first n ids, skip the wrong ids for range()/offset, or page incorrectly when the requested page size is above the scanner page size. Please route all order-sensitive HStore range-index scans through the ordered merge, or make the graph layer fall back to a correctness-preserving full ordered scan before applying offset/limit.
imbajin
left a comment
There was a problem hiding this comment.
Requesting changes because the current HStore range-index fallback can turn bounded paged queries into full-range materialization in the graph layer. Please either move this to a bounded HStore ordered-merge path, or remove the fallback from this PR and leave it as a dedicated follow-up.
| ConditionQuery scanQuery = query.copy(); | ||
| scanQuery.page(null); | ||
| scanQuery.offset(0L); | ||
| scanQuery.limit(Query.NO_LIMIT); |
There was a problem hiding this comment.
This path is still risky because it preserves correctness by clearing the backend bounds and materializing the whole matched range before applying limit / offset / page in the graph layer.
Some Gremlin examples that can trigger the issue on HStore range indexes:
g.V().hasLabel('software').has('price', P.between(1, 1000000)).limit(10)
g.V().hasLabel('person').has('birth', P.between(start, end)).range(20, 40)
g.V().hasLabel('event').has('timestamp', P.gte(dayStart)).has('~page', '').limit(20)The user asks for a bounded page, but this fallback can behave like:
range-index query + limit/page
|
v
clear page + offset + limit
|
v
scan the full matched HStore range
|
v
sort everything in memory
|
v
slice the requested page
On a large HStore deployment, a query that appears to request only 10 or 20 rows can scan and sort a very large range under index-label read locks. That is a significant hidden performance risk and does not match the expected behavior of paged range queries.
I think the preferred fix is option B: keep the ordering fix in the HStore/store-client layer with a bounded ordered merge of partition iterators, then stop as soon as enough rows are collected for offset + limit or the requested page size:
partition iterator A
partition iterator B
partition iterator C
\ | /
ordered k-way merge
|
v
stop after the requested page is satisfied
If that is too large for this PR, please consider removing this graph-layer materialized fallback from the current PR, leave a clear FIXME near the HStore range scan path, and handle globally ordered HStore range-index paging in a separate follow-up. I don't think we should merge a correctness fix that silently turns bounded range-index queries into potential full-range scans.
imbajin
left a comment
There was a problem hiding this comment.
Blocking: no. Summary: No obvious issues found in the current head. Evidence: git diff --check; QueryTest/QueryResultsTest 12 tests passed; VertexCoreTest/EdgeCoreTest 438 tests passed; HstoreTableTest 2 tests passed; latest GitHub checks passed.
|
I took another look at the latest head and I don't think this should be treated as a generic Current head: Current PR state: So the failing GitHub Actions job is specifically Why I don't think this is just a slow runner: The job was not stuck in dependency download, compile, or startup. It entered The most relevant log evidence I found is the last useful stack before cancellation: There is also a Netty The suspicious overlap with this PR is high: My current hypothesis is that the HStore range-index paging cursor/page-state is not advancing correctly for this path. Conceptually: Given that this PR directly changes the HStore/query paging area, I don't think a rerun alone is enough evidence unless the same HStore range-index paging path is proven stable. I would not merge this head as-is. Please fix or revert the HStore range-index paging part, or move that piece into a dedicated follow-up where the storage-client page-state contract can be handled explicitly. At minimum, I think we need a targeted regression check around: The Codecov project failure is still visible too, but the |
imbajin
left a comment
There was a problem hiding this comment.
Requesting changes for now because the previous head exposed a blocking HStore CI issue and the current head has not yet proven that path is fixed.
The detailed analysis is here: #2994 (comment)
Important context: the PR head has moved to 057b390380552ec32494639bc7f7edcac08f043d, and its checks are still queued/in progress. I don't want to over-claim that the new head still reproduces the same timeout, but the previous failure was not a generic pd-store timeout: pd and store passed, while hstore hung in Run core test for almost 6 hours and the log path overlapped with this PR's HStore/index paging changes.
Please either fix/revert the HStore range-index paging path or wait for the new head to show that hstore passes cleanly with the relevant regression coverage.
Purpose of the PR
ConditionQuery.condition()currently mixes several different meanings in one API, including:This PR keeps the legacy
condition()behavior unchanged, adds explicit condition-resolution APIs, and migrates the high-riskLABELcall sites to use the clearer semantics.Main Changes
ConditionQuerycontainsCondition(Object key)conditionValues(Object key)conditionValue(Object key)condition()method backward-compatibleLABEL-related high-risk callers to the new APIs in:LABELlegacy usages in this first stepVerifying these changes
Added and extended regression coverage for the new semantics:
QueryTest#testConditionWithoutLabelQueryTest#testConditionWithEqAndInQueryTest#testConditionWithSingleInValuesQueryTest#testConditionWithConflictingEqAndInQueryTest#testConditionWithMultipleMatchedInValuesAdded a targeted regression for the label-index fallback path:
VertexCoreTest#testCollectMatchedIndexesByJointLabelsWithIndexedPropertiesThis test verifies:
Existing label-query regressions were also rechecked to ensure no behavior regression:
EdgeCoreTest#testQueryInEdgesOfVertexByLabelsEdgeCoreTest#testQueryInEdgesOfVertexByConflictingLabelsEdgeCoreTest#testQueryInEdgesOfVertexBySortkeyVertexCoreTest#testQueryByJointLabelsDoes this PR potentially affect the following parts?
Documentation Status
Doc - TODODoc - DoneDoc - No Need